Dynamic and Approximate Pattern Matching in 2D
Authors
Abstract
We consider dynamic and online variants of 2D pattern matching between an m × m pattern and an n × n text. All the algorithms we give are randomised and give correct outputs with at least constant probability.
– For dynamic 2D exact matching where updates change individual symbols in the text, we show updates can be performed in O(log n) time and queries in O(log m) time.
– We then consider a model where an update is a new 2D pattern and a query is a location in the text. For this setting we show that Hamming distance queries can be answered in O(log m + H) time, where H is the relevant Hamming distance.
– Extending this work to allow approximation, we give an efficient algorithm which returns a (1 + ε) approximation of the Hamming distance at a given location in O(ε⁻² log m log log n) time.
Finally, we consider a different setting inspired by previous work on locality sensitive hashing (LSH). Given a threshold k, and after building the 2D text index and receiving a 2D query pattern, we must output a location where the Hamming distance is at most (1 + ε)k as long as there exists a location where the Hamming distance is at most k.
– For our LSH-inspired 2D indexing problem, the text can be preprocessed in O(n log n) time into a data structure of size O(n) with query time O(nm).
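To make the query types above concrete, here is a minimal brute-force sketch in Python. It is not the randomised data structures of the paper; it only pins down what a Hamming distance query at a text location returns and what a threshold-k search is allowed to output. The names `hamming_at` and `find_within` are hypothetical, and the text and pattern are assumed to be square grids given as lists of equal-length strings.

```python
from typing import List, Optional, Tuple

Grid = List[str]  # assumption: a square grid, one string per row


def hamming_at(text: Grid, pattern: Grid, row: int, col: int) -> int:
    """Count mismatches between the m x m pattern and the m x m block
    of the text whose top-left corner is (row, col). Takes O(m^2) time."""
    m = len(pattern)
    return sum(
        text[row + i][col + j] != pattern[i][j]
        for i in range(m)
        for j in range(m)
    )


def find_within(text: Grid, pattern: Grid, k: int) -> Optional[Tuple[int, int]]:
    """Return the first location (in row-major order) whose Hamming
    distance to the pattern is at most k, or None if no location qualifies.
    This exhaustive scan is exact, so it also satisfies the weaker
    (1 + eps) * k guarantee described in the abstract."""
    n, m = len(text), len(pattern)
    for row in range(n - m + 1):
        for col in range(n - m + 1):
            if hamming_at(text, pattern, row, col) <= k:
                return (row, col)
    return None


if __name__ == "__main__":
    text = ["abcd", "efgh", "ijkl", "mnop"]
    pattern = ["fg", "jx"]  # differs from the text block at (1, 1) in one symbol
    print(hamming_at(text, pattern, 1, 1))
    print(find_within(text, pattern, 1))
```

The scan costs O(n² m²) time in the worst case, which is the baseline the bounds quoted in the abstract are meant to improve on.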
Similar resources
The $\mathcal{E}$-Average Common Submatrix: Approximate Searching in a Restricted Neighborhood
This paper introduces a new (dis)similarity measure for 2D arrays, extending the Average Common Submatrix measure. This is accomplished by: (i) considering the frequency of matching patterns, (ii) restricting the pattern matching to a fixed-size neighborhood, and (iii) computing a distance-based approximate matching. This will achieve better performances with low execution time and larger infor...
Approximate String Matching with Ordered q-Grams
Approximate string matching with k differences is considered. Filtration of the text is a widely adopted technique to reduce the text area processed by dynamic programming. We present sublinear filtration algorithms based on the locations of q-grams in the pattern. Samples of q-grams are drawn from the text at fixed periods, and only if consecutive samples appear in the pattern approximately in...
Approximate string matching as an algebraic computation
Approximate string matching has a long history and employs a wide variety of methods (see e.g. the survey [2]). We consider a variant of approximate matching that compares a fixed pattern string to every substring in the text string by a rational-weighted edit distance (e.g. the indel distance, defined as the number of character insertions and deletions, or the indelsub/Levenshtein distance, wh...
A Comparative Study of Different Longest Common Subsequence Algorithms
The longest common subsequence is a classical problem which is solved by using the dynamic programming approach. The LCS problem has an optimal substructure: the problem can be broken down into smaller, simple "subproblems", which can be broken down into yet simpler subproblems, and so on, until, finally, the solution becomes trivial. The LCS problem also has overlapping subproblems: the soluti...
Adaptive Approximate Record Matching
Typographical data entry errors and incomplete documents produce imperfect records in real-world databases. These errors generate distinct records which belong to the same entity. The aim of Approximate Record Matching is to find multiple records which belong to an entity. In this paper, an algorithm for Approximate Record Matching is proposed that can be adapted automatically with input error...